{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d792f35b",
   "metadata": {},
   "source": [
    "# Lesson 2: Interacting with a CSV Data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6be6d049",
   "metadata": {},
   "source": [
    "## Setup and connect to the Azure OpenAI endpoint"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "399c89a2",
   "metadata": {},
   "source": [
    "**Note**: The pre-configured cloud resource grants you access to the Azure OpenAI GPT model. The key and endpoint provided below are intended for teaching purposes only. Your notebook environment is already set up with the necessary keys, which may differ from those used by the instructor during the filming."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "83ac3e59-d0d1-4110-a059-3a55d0d5e15e",
   "metadata": {
    "height": 217
   },
   "outputs": [],
   "source": [
    "import os \n",
    "import pandas as pd\n",
    "\n",
    "from IPython.display import Markdown, HTML, display\n",
    "from langchain.schema import HumanMessage\n",
    "from langchain_openai import AzureChatOpenAI\n",
    "\n",
    "model = AzureChatOpenAI(\n",
    "    openai_api_version=\"2023-05-15\",\n",
    "    azure_deployment=\"gpt-4-1106\",\n",
    "    azure_endpoint=os.getenv(\"AZURE_OPENAI_ENDPOINT\"),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba9075a3",
   "metadata": {},
   "source": [
    "## Load the dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb0dc855",
   "metadata": {},
   "source": [
    "**Note**: To access the data locally, use the following code:\n",
    "\n",
    "```\n",
    "os.makedirs(\"data\",exist_ok=True)\n",
    "!wget https://covidtracking.com/data/download/all-states-history.csv -P ./data/\n",
    "file_url = \"./data/all-states-history.csv\"\n",
    "df = pd.read_csv(file_url).fillna(value = 0)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6c75799-55e2-4d39-a1ab-7b968b4f35a4",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv(\"./data/all-states-history.csv\").fillna(value = 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6a261ac",
   "metadata": {},
   "source": [
    "## Prepare the Langchain dataframe agent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60d0793b-90e7-4e8b-b2f1-66a8d1cf652d",
   "metadata": {
    "height": 115
   },
   "outputs": [],
   "source": [
    "from langchain.agents.agent_types import AgentType\n",
    "from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent\n",
    "\n",
    "agent = create_pandas_dataframe_agent(llm=model,df=df,verbose=True)\n",
    "\n",
    "agent.invoke(\"how many rows are there?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c6dedce",
   "metadata": {},
   "source": [
    "## Design your prompt and ask your question"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d005008-bad6-4458-90ec-6372d9ebe61c",
   "metadata": {
    "height": 523
   },
   "outputs": [],
   "source": [
    "CSV_PROMPT_PREFIX = \"\"\"\n",
    "First set the pandas display options to show all the columns,\n",
    "get the column names, then answer the question.\n",
    "\"\"\"\n",
    "\n",
    "CSV_PROMPT_SUFFIX = \"\"\"\n",
    "- **ALWAYS** before giving the Final Answer, try another method.\n",
    "Then reflect on the answers of the two methods you did and ask yourself\n",
    "if it answers correctly the original question.\n",
    "If you are not sure, try another method.\n",
    "- If the methods tried do not give the same result,reflect and\n",
    "try again until you have two methods that have the same result.\n",
    "- If you still cannot arrive to a consistent result, say that\n",
    "you are not sure of the answer.\n",
    "- If you are sure of the correct answer, create a beautiful\n",
    "and thorough response using Markdown.\n",
    "- **DO NOT MAKE UP AN ANSWER OR USE PRIOR KNOWLEDGE,\n",
    "ONLY USE THE RESULTS OF THE CALCULATIONS YOU HAVE DONE**.\n",
    "- **ALWAYS**, as part of your \"Final Answer\", explain how you got\n",
    "to the answer on a section that starts with: \"\\n\\nExplanation:\\n\".\n",
    "In the explanation, mention the column names that you used to get\n",
    "to the final answer.\n",
    "\"\"\"\n",
    "\n",
    "QUESTION = \"How may patients were hospitalized during July 2020\" \n",
    "\"in Texas, and nationwide as the total of all states?\"\n",
    "\"Use the hospitalizedIncrease column\" \n",
    "\n",
    "\n",
    "agent.invoke(CSV_PROMPT_PREFIX + QUESTION + CSV_PROMPT_SUFFIX)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
